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ABSTRACT 

Long-term precise timing of Galactic millisecond pulsars holds great promise for mea- 
suring the long-period (months-to-years) astrophysical gravitational waves. Several 
gravitational- wave observational programs, called Pulsar Timing Arrays (PTA), are 
being pursued around the world. 

Here we develop a Bayesian algorithm for measuring the stochastic gravitational- 
wave background (GWB) from the PTA data. Our algorithm has several strengths: 
(1) It analyses the data without any loss of information, (2) It trivially removes sys- 
tematic errors of known functional form, including quadratic pulsar spin-down, annual 
modulations and jumps due to a change of equipment, (3) It measures simultaneously 
both the amplitude and the slope of the GWB spectrum, (4) It can deal with unevenly 
sampled data and coloured pulsar noise spectra. We sample the likelihood function 
using Markov Chain Monte Carlo (MCMC) simulations. We extensively test our ap- 
proach on mock PTA datasets, and find that the algorithm has significant benefits 
over currently proposed counterparts. We show the importance of characterising all 
red noise components in pulsar timing noise by demonstrating that the presence of a 
red component would significantly hinder a detection of the GWB 

Lastly, we explore the dependence of the signal-to-noise ratio on the duration of 
the experiment, number of monitored pulsars, and the magnitude of the pulsar timing 
noise. These parameter studies will help formulate observing strategies for the PTA 
experiments. 

Key words: gravitational waves - methods: data analysis - pulsars: general 



1 INTRODUCTION 

At the time of this writing several large projects are be- 
ing pursued in order to directly detect astrophysical grav- 
itational waves. This paper concerns a program to detect 
gravitational waves using pulsars as nearly-perfect Einstein 
clocks. The practical idea is to time a set of millisecond 
pulsars (called the " Pulsar Timing Array" , or PTA) over 
a number of years (iFoster fc Backerl Il990l ). Some of the 
millisecond pulsars create pulse trains of exceptional reg- 
ularity. By perturbing the space-time between a pulsar 
and the Earth, the gravitational waves (GWs) will cause 
extra deviations from the periodicity in the pulse arrival 
times l|Estabrook fc Wahlguistll 19751 : ISazhinll 19781 : [Petweilerl 
1 19791 ). Thus from the measurements of these deviations 
(called "timing-residuals", or TR), one may measure the 
gravitational waves. Currently, several PTA project are op- 
erating around the globe. Firstly, at the Arecibo Radio 
Telescope in North- America several millisecond pulsars have 



been timed for a number of years. These observations have 
already been used to place interesting upper limits on the in- 
tensity o f gravitational waves which are pas sing through the 
Galaxy (|Kaspi et all 1 1994 Ibommenl l200lh . Together with 
the Green Bank Telescope, the Arecibo Radio Telescope 
will be used as an instrument of NANOGrav, the North 
American PTA. Secondly, the European PTA is being set 
up as an international collaboration between Great Britain, 
France, Netherlands, Germany, and Italy, and will use 5 Eu- 
rope an radio telescopes to monitor about 20 millisecond pul- 
sars (jStappers et al.ll2006l '). Finally, the Parkes PTA in Aus- 
tralia has been using the Parkes multi-beam radio-telescope 
to monitor 20 millisecond pulsars ( Mancheste r 20_06). Some 
of the Parkes and Arecibo data have also been u sed to place 
the most stringent limits on the GWB to date (|jenet et al.l 
'2006). 

One of the main astrophysical targets of the PTAs 
is the stochastic background of the gravitational waves 
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(GWB). This GWB is thought to be generated by a large 
number of black-hole binaries whi ch are thought to be 
located at th e centres of g alaxies (Bcaclman et al. 'l980'; 
PhinnevI I2001I: I Jaffe fc Backer .200a : 



. . . .Jaffe fc Backer .200a : .Wyithe fc Loeb 2003,; 

Sesana et al.ll2005^ ■ by relic grav tational waves l|Grishchukl 
20051 ). or, more speculatively, by cusps in the cosmic-string 



loops l|Damour fc Vilenkinll2005l ). This paper develops an al- 
gorithm for the optimal PTA measurement of such a GWB. 

The main difficulty of such a measurement is that not 
only Gravitational Waves create the pulsar timing-residuals. 
Irregularities of the pulsar-beam rotation (called the "tim- 
ing noise" ) , the receiver noise, the imprecision of local clocks , 
the polarisation calibration of the telescope (|Brittonll2000h . 
and the variation in the refractive index of the interstellar 
medium all contribute significantly to the timing-residuals, 
making it a challenge to separate these noise sources from 
the gravitational-wave signal. However, the GWB is ex- 
pected to induce correlations between the timing-residuals 
of different pulsars. These correlations are of a specific func- 
tional form [given by Eq. ([9)l below] , whi ch is different from 
those i ntroduced b y other noise sources (iHellings fc Down3 
1 19831 ). iJenet et al.l (|2005l . hereafter JOS) have invented a 
clever algorithm which uses the uniqueness of the GWB- 
induced correlations to separate the GWB from other noise 
sources, and thus to measure the magnitude of the GWB. 
Their idea was to measure the timing-residual correlations 
for all pairs of the PTA pulsars, and check how these corre- 
lations depend on the sky-angles between the pulsar pairs. 
JOS have derived a statistic which is sensitive to the func- 
tional form of the GWB-induced correlation; by measuring 
the value of this statistic one can infer the strength of the 
GWB. While JOS algorithm appears robust, we believe that 
in its current form it does have some drawbacks, in partic- 
ular: 

(1) The statistic used by JOS is non-linear and non-quadratic 
in the pulsar-timing-residuals, which makes its statistical 
properties non-transparent. 

(2) The pulsar pairs with the high and low intrinsic timing 
noise make equal contributions to the JOS statistic, which is 
clearly not optimal. 

(3) The JOS statistic assumes that the timing-residuals of all 
the PTA pulsars are measured during each observing run, 
which is generally not the case. 

(4) The JOS signal-to-noise analysis relies on the prior knowl- 
edge of the intrinsic timing noise, and there is no clean way 
to separate this timing noise from the GWB. 

(5) The prior spectral information on GWB is used for 
whitening the signal; however, there is no proof that this 
is an optimal procedure. The spectral slope of the GWB is 
not measured. 

In this paper we develop an algorithm which addresses 
all of the problems outlined above. Our method is based on 
essentially the same idea as that of JOS: we use the unique 
character of the GWB-induced correlations to measure the 
intensity of the GWB. The algorithm we develop below is 
Bayesian, and by construction uses optimally all of the avail- 
able information. Moreover, it deals correctly and efficiently 
with all systematic contributions to the timing-residuals 
which have a known functional form, i.e. the quadratic pul- 
sar spin-downs, annual variations, one-time discontinuities 
(jumps) due to equipment change, etc. Many parameters of 
the timing model (the model popular pulsar timing packages 



use to generate TRs from pulsar arrival times) fall in this 
category. 

The plan of the paper is as follows. In the next sec- 
tion we review the theory of the GWB-generated timing 
residuals and introduce our model for other contributions 
to the timing residuals. In Sec. [3] we explain the principle 
of Bayesian analysis for GWB-measurement with a PTA, 
and we evaluate the Bayesian likelihood function. There we 
also show how to analytically marginalise over the contri- 
butions of known functional form but unknown amplitude 
(i.e., annual variations, quadratic residuals due to pulsar 
spin-down, etc.). The details of this calculation are laid out 
in Appendix A. Section U discusses the numerical integra- 
tion technique which we use in our likelihood analysis: the 
Markov Chain Monte Carlo (MCMC). In Sec. [S] we show 
the analyses of mock PTA datasets. For each mock dataset, 
we construct the probability distribution for the intensity 
of the GWB, and demonstrate its consistency with the in- 
put mock data parameters. We study the sensitivity of our 
algorithm for different PTA configurations, and investigate 
the dependence of the signal-to-noise ratio on the duration 
of the experiment, on redness and magnitude of the pulsar 
timing noise, and on the number of clocked pulsars. In Sec. [6] 
we summarise our results. 



2 THE THEORY OF GW-GENERATED 
TIMING-RESIDUALS 

2.1 Timing residual correlation 

The measured millisecond-pulsar timing-residuals contain 
contributions from several stochastic and deterministic pro- 
cesses. The latter include the gradual deceleration of the 
pulsar spin, resulting in a pulsar rotational period derivative 
which induces timing residuals varying quadratically with 
time (hereafter referred to as "quadratic spin-down"), the 
annual variations due to the imperfect knowledge of the pul- 
sar positions on the sky, the ephemeris variations caused by 
the known planets in the solar system, and the jumps due to 
equipment change (Manchester 2006). The stochastic com- 
ponent of the timing-residuals will be caused by the receiver 
noise, clock noise, intrinsic timing noise, the refractive index 
ffuctuations in the interstellar medium, and, most impor- 
tantly for us, the GWB. For the purposes of this paper we 
restrict ourselves to considering the quadratic spin-downs, 
intrinsic timing noise, and the GWB; other components can 
be similarly included, but we omit them for mathematical 
simplicity. In this case, the i"" timing residual of the a**" 
pulsar can be written as 



Stai 



' Stai^ + St™ + Q{tai), 



(1) 



where 5t^^ and St^f are caused by the GWB and the pulsar 
timing noise, respectively, and 



-j.{tai) = Aal + Aa2tai + Aagta 



(2) 



represent the quadratic spin-down. One expects the timing 
noise from different pulsars to be uncorrelated, while the 
GWB will cause correlations in the timing-residuals between 
different pulsars. Therefore, the information about GWB 
can be extracted by correlating the timing residual data be- 
tween the different pulsars (JOS). If one assumes that both 
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GWB-generated residuals and the intrinsic timing noise are 
stochastic Gaussian processes, then we can represent them 
by the (n x n) coherence matrices: 

/ r.GW c,GW\ _ ^GW 



/r.PNr.PNi 



PN 

(ai)(bj) 1 



with the total coherence matrix given by 



(3) 



(4) 



The timing-residuals are then distributed as a multidimen- 
sional Gaussian: 



{St) = 



^(27r)"detC 



exp 



C, 



(ai)(bj) 



\ ^ (5i{ai) — Qaitai)) 
b{tbj))] , (5) 

where P denotes the probability distribution of the timing- 
residuals. To be able to use Eq. ([Sjl we must 

(1) be able to evaluate the GWB-induced coherence matrix 
from the theory, as a function of variables that parametrise 
the GWB spectrum, and 

(2) introduce well-motivated parametrization of the pul- 
sar timing noise. In this work, the spectral density of the 
stochastic GW background is taken to be a power law 
l|Phinnev|l200ll : Ijaffe fc Backeill2003l : IWvithe fc Loebll2003l : 



iMaggiore 



20001 ') 



yr- 



(6) 



where Sh represents the spectral density, A is the GW am- 
plitude, / is the GW frequency, and 7 is an exponent char- 
acterising the GWB spectrum. If the GWB is dominated by 
the supermassive black hole binaries, then 7 — 7/3 (Phin- 
ney 2001). This definition is eq uivalent to t he use of the 
characteristic strain as defined in ljenet et al. I (|2006i): 



hc = A 



f 



yr- 



(7) 



with 7 = 1 — 2a. The GWB-induced coherence matrix is 
then given by 



GW 
(ai)(bj) 



"Em 



^{r(-l-7)sin(^) iUr) 



7+1 



(2n)!(2n-l-7) 



(8) 



Here aab is the geometric factor given by 



3 1 ~ cos 9ab , /I — cos 9ab 

a.b^- ln( 



The pulsar timing noise is assumed to be Gaussian, with 
a certain functional form of the power spectrum. The true 
profile of the millisecond pulsar timing noise spectrum is not 
well-known at present time. The timing residuals of the most 
precisely observed pulsars indicate that pulsar timing noise 
has a white and poorly-constrained red component (J. Ver- 
biest and G. Hobbs, private communications). 

For the purposes of this paper we will always choose 
the spectra to be of the same functional form for all pulsars, 
but this is not an inherent limitation of the algorithm. We 
consider 3 cases of pulsar timing noise spectra: 

(1) White (flat) spectra 

(2) Lorentzian spectra 

(3) Power-law spectra 

Obviously, one could also consider a timing noise which is 
a superposition of these components; we do not do this at 
this exploratory stage. If we choose the pulsar timing noise 
spectrum to be white, with an amplitude Na, the resulting 
correlation matrix becomes: 



^PN — white Ar2 r r 



1 1 - cos ggj, 1 1 

-4 ^ + 2 + 2^«^'(9) 



where Ogi, is the angle between pulsar a and pulsar 6 
ijHellings fc Downsll 19831 ). r = 27r {tai - Uj), T is the gamma 
function, and Jl is the low cut-off frequency, chosen so that 
1//l is much greater than the duration of the PTA opera- 
tion. Introducing /_l is a mathematical necessity, since other- 
wise the GWB-induced correlation function would diverge. 
However, we show below that the low-frequency part of the 
GWB is indistinguishable from an extra spin-down of all 
pulsars which we already correct for, and that our results 
do not depend on the choice of /l provided that /lt ^ 1. 



(ai)(bj) — J»a"a()"ij- (10) 

The Lorentzian spectrum is a red spectrum with a typ- 
ical frequency that determines the redness of the timing 



Saif) 



N 



(11) 



/o(i+(i)T 

which yields the foUowing correlation matrix: 

/-yPN — lor 7»t2 c- / r \ 

<-(ai)(i>j) = Ng5ateX.p{-foT), 

where /o is a typical frequency and A'^^ is the amplitude. 

By using a power law spectral density with amplitude 
Na and spectral index 7^ , one gets a timing-noise coherence 
matrix analogous to the one in Eq. ([8]): 



(12) 



^PN-pl 
'~'(ai){bj) 



|r(l - 7„) sm — j (/, 



,7a-l 



(2n)!(2n + l-7a) 



(13) 



3 BAYESIAN APPROACH 
3.1 Basic ideas 

The method described in this report is based upon a 
Bayesian approach to the parameter inference. The general 
idea of the method is to (a) assume that the physical 
processes which produce the timing-residuals can be char- 
acterised by several parameters, and (b) use the Bayes 
theorem to derive from the measured data the probability 
distribution of the parameters of our interest. In our case, 
we assume that the timing residuals are created by 

(1) the GWB; we parametrise it by its amplitude A and 
slope 7, as in equation (|6]). 

(2) the intrinsic timing noise of the 20 monitored millisec- 
ond pulsars. We assume that the timing noise of each of 
the pulsars is the random Gaussian noise, with a variety 
of possible spectra described in the previous section. We 
shall refer to the variables parametrizing the timing noise 
spectral shape as TNa- 
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(3). The quadratic spin-downs, parametrised for each of the 
pulsars by Aai, Aa2, and Aas, cf. Eq. ([2)|. 

With these assumptions, we shall write down be- 
low the expression for the probability distribution 
P(data|parameters) of the data, as a function of the param- 
eters. By Bayes theorem, we can then compute the posterior 
distribution function; the probability distribution of the pa- 
rameters given a certain dataset: 



P(parameters|data) = 



P(data|parameters) x 
Po (parameters) 
P(data) ' 



(14) 



Here Po (parameters) is the prior probability of the un- 
known parameters, which represents all our current knowl- 
edge about these parameters, and P(data) is the Bayesian 
evidence, which we will use here as a normalisation factor 
to ensure that P{A,'y,TNa, Aai, Aa2, Aa3\da,ta,) integrates 
to unity over the parameter space. We note here that the 
Bayesian evidence is in essence a goodness of fit measure 
that can be used for model selection. Ifowever, we will ig- 
nore the Bayesian evidence in this work and postpone the 
model selection part of the algorithm to future work. For our 
purposes, we are only interested in A and 7, which means 
that we have to integrate P{A,'Y,TNa, Aai, Aa2, Aa3\dsLta) 
over all of the other parameters. Luckily, as we show below, 
for a uniform prior the integration over Aai, Aa2, and Aas 
can be performed analytically. This amounts to the removal 
of the quadratic spin-down component to the pulsar data. 
We emphasise that this removal technique is quite general, 
and can be readily applied to unwanted signal of any known 
functional form (i.e., annual modulations, jumps, etc. — see 
Sec. I3.2|l , even if those parameters have already been fit for 
while calculating the timing residuals. The integration over 
TNa must be performed numerically. 

In this work we shall use MCMC simulation as a multi 
dimensional integration technique. Besides fiat priors for 
most of the parameters, we will use slightly peaked priors 
for parameters which have non-normalisable likelihood func- 
tions. This ensures that the Markov Chain can converge. 

In the rest of the paper, we detail the implementation 
and tests of our algorithm. 



3.2 Removal of the quadratic spin-downs and 

other systematic signals of known functional 
form 

While this subsection is written with the PTA in mind, it 
may well be useful for other applications in pulsar astron- 
omy. We thus begin with a fairly general discussion, and 
then make it more specific for the PTA case. 

Consider a random Gaussian process Sxf with a coher- 
ence matrix C{a), which is contaminated by several system- 
atic signals with known functional forms fp{ti) but a-priori 
unknown amplitudes ^p. Here cr is a set of interesting pa- 
rameters which we want to determine from the data Sx. 
The resulting signal is given by 



Sxi = Sxf + ^ £,pfp{U 



(15) 



or, in the vector form, by 
5x ■ 



5x° + fC 



(16) 



Here the components of the vectors 5x, 5x , and ^ are given 
by Sxi, Sxf, and ^p, respectively, and F is the non-square 
matrix with the elements Fip = fp{ti). Note that the di- 
mensions of 5x and ^ are different. The Bayesian probability 
distribution for the parameters is given by 



Pia,^\Sx) 



M 



VdetC 



exp 



1 



{5x-FOC'\6x- 



(17) 



where Pq is the prior probability and M is the normalisa- 
tion. Since we are only interested in a, we can integrate 
P((T, (,\Sx) over the variables ^. This process is referred to as 
marginalisation; it can be done analytically if we assume a 
fiat prior for ^ [i.e., if Po(cr, ^) is ^-independent], since en- 
ter at most quadratically into the exponential above. After 
some straightforward mathematics which we have detailed 
in Appendix A, we get 



P{a\5x) 



M' 



^det(C) det(pTc-ii?) 



(18) 



C'Sx 



X exp 

where M' is the normalisation, and 
C' = - C~^F{F'^C~^F)'^f'^C~ 



(19) 



and the T-superscript stands for the transposed matrix. 
Equation psp is one of the main equations of the paper, 
since it provides a statistically rigorous way to remove (i.e., 
marginalise over) the unwanted systematic signals from ran- 
dom Gaussian processes. One can check directly that the 
above expression for P{a\5x) is insensitive to the values ^p 
of the amplitudes of the systematic signals in the Eq. (|15p . 

We now apply this formalism to account for the 
quadratic spin-downs in the PTA. As in Sec. [21 it will be con- 
venient to use the 2-index notation for the timing-residuals, 
Stai measured at the time tai, where a is the pulsar index, 
and i is the number of the timing residual measurement 
for pulsar a. The space of the spin-down parameters Aaj, 
j = 1,2,3 has 3A'^ dimensions, where A'^ is the number of 
pulsars in the array. In the component language, we write 



Stai — Stfi + F(^ai)(bj)Abj, 

where 

F(ai)(b3) = Sabtii \ 



(20) 



(21) 



St'^ is the part of the timing residual due to a random Gaus- 
sian process (i.e., GWB, timing noise, etc.), and j = 1, 2, 3. 
The quantities F^ai)(bj) are components of the matrix opera- 
tor which acts on the 3A'^-dimensional vector in the parame- 
ter space and produces a vector in the timing-residual space. 
For example, for 20 pulsars, each with 250 timing residual 
observations, the matrix P(ai){i,j) has 20 x 250 — 5000 rows, 
each marked by 2 indices a = 1,...,20, i — 1,...,250, and 
20 X 3 = 60 columns, each marked by 2 indices b = 1, 20, 
j — 1, 2, 3. Thus in the vector form, one can write Eq. (|20|l 
as 
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St = St^ + FA, (22) 

which is identical to the Eq. (|16|l . We thus can use Eq. psp 
to remove the quadratic spin-down contribution from the 
PTA data. 

Although we only demonstrate this technique for 
quadratic spin-down, this removal technique will be useful 
for treating other noise sources in the PTA. All sources of 
which the functional form is known (and therefore can be 
fit for, as most popular pulsar timing packages do) can be 
dealt with, i.e. 

(1) Annual variation of the timing-residuals due to the im- 
precise knowledge of the pulsar position on the sky. The 
annual variation in each of the pulsars will be a predictable 
function of the associated 2 small angular errors (latitude 
and longitude). Thus our parameter space will expand by 
2N, but this will still keep the F matrix manageable. 

(2) Changes of equipment will introduce extra jumps, and 
must be taken into account. This is trivial to deal with using 
the techniques described above. 

(3) Some of the millisecond pulsars are in binaries, and their 
orbital motion must be subtracted. The errors one makes in 
these subtractions will affect the timing-residuals. They can 
be parametrised and dealt with using the techniques of this 
section (we thank Jason Hessels for pointing this out). 

3.3 Low- frequency cut-off 

All predictions for GWB spectrum show a steep p ower law 
PC / where for black-hole binaries 7 — 7/3 jPhinnevI 
I2OOII ). Physically, there is a low-frequency cut-off to the 
spectrum, due to the fact that black-hole binaries with peri- 
ods greater than 1000 years shrink mostly due to the exter- 
nal friction (i.e., scattering of circum-binary stars or excita- 
tion of density waves in a circum-binary gas disc), and not 
to gravitational radiation. However, while the exact value 
of the low-frequency cut-off is poorly constrained, the PTA 
should not be sensitive to it since the duration of the cur- 
rently planned experiments is much shorter than 1000 years. 
In this subsection, we show this formally by explicitly intro- 
ducing the low-frequency cut-off and by demonstrating that 
our Bayesian probabilities are insensitive to its value. 

Consider the expression in Eq. ([Sj for the GWB- 
generated correlation matrix for the timing-residuals. This 
expression contains an integral of the form 

poo 

J= / cos(/r)/-(^+^'d/, (23) 

where t = 27r(fi — tj). When the low- frequency cut-off is 
much smaller than the inverse of the experiment duration, 
i.e. when /;r <^ 1, the integral above can be expanded as 

where 

B = r(-l-7)sin(^)r-+\ (25) 

In the expansion above we have assumed 1 < 7 < 3. The 
terms which contain /l diverge when /l goes to zero, and 
scale as r° or with respect to the time interval. We now 
show that these divergent terms get absorbed in the process 
of elimination of the quadratic spin-downs. 



Suppose that we add to the timing-residuals of a pulsar 
a quadratic spin-down term, Ai+A2t+A3t'^. The spin-down- 
removal procedure described in the previous section makes 
our results completely insensitive to this addition: A^s could 
be arbitrarily large but the measured GWB would still be 
the same. Clearly, the same is true if one treats Ai, A2, A3 
not as fixed numbers, but as random variables drawn from 
some Gaussian distribution. The correlation introduced into 
the timing-residuals by adding a random quadratic spin- 
down is given by 

(SuStj) = (Al) + {A1A2} {U + tj) 

+ 2{Al)Ut, + (AiA3)(t? + t]) (26) 

+ {A2A3)Utj{U + tj) + {Al)thl 

The /^-dependent part of Eq. (1241) contains terms which 
scale as + t'^, titj, and const, and thus have the same 
functional ti,tj dependence as some of the terms in Eq. (|26p . 
Since the terms in Eq. (|26[) can be made arbitrarily large, it 
is clear that the terms corresponding to the low-frequency 
cutoff could be absorbed into the correlation function cor- 
responding to the quadratic spin-down with the stochastic 
coefficients. We have made this argument for the timing- 
residuals from a single pulsar, but it is trivial to extend it 
to the case of multiple pulsars. Thus our results are not sen- 
sitive to the actual choice of the /l so long as fir <^ 1; this 
is confirmed by direct numerical tests. 

4 NUMERICAL INTEGRATION TECHNIQUES 
4.1 Metropolis Monte Carlo 

The Bayesian probability distribution for the PTA is com- 
puted in multi dimensional parameter space, where all of 
the parameters except 2 characterise intrinsic pulsar timing 
noise and other potential interferences. To obtain meaning- 
ful information about the GWB, we need to integrate the 
probability function over all of the unwanted parameters. 
This is a challenging numerical task: a direct numerical in- 
tegration over more than several parameters is prohibitively 
computationally expensive. Fortunately, numerical short- 
cuts do exist, and the most common among them is the 
Markov Chain Monte Carlo (MCMC) simulation. In a typi- 
cal MCMC, a set of semi-random walkers sample the param- 
eter space in a clever way, each gener ating a large number 
of seq uential locations called a chain (|Newman fc Barkemal 
After a sufficient number of steps, the density of 
points of the chain becomes proportional to the Bayesian 
probability distribution. The number of steps required for 
the chain convergence scales linearly with the number of di- 
mensions of the parameter space; typically fewxlO** steps 
are required for reliable conver g ence. In this paper we use 
the Metropolis (Newman fc Barkema 1999) algorithm for 
generating the chain, which can be used with an arbitrary 
distribution, the proposal distribution, for generating new 
locations of the chain. We use a Gaussian proposal distribu- 
tion, centred at the current location in the parameter space. 
During an initial period, the burn-in period, the width of the 
proposal distribution in all dimensional directions is set to 
yield the asymptotically opti mal acceptance rate of 23.4% 
for the Metropolis algorithm (|Roberts et al.|[l997h . At the 
end of the MCMC simulation we check the convergence of 
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the chain using the bootstrap method llEfronll 19791 ). We also 
calculate the global maximum likelihood v alue for all p aram- 
eters using a conjugate directions search ()Brentlll973h . 



4.2 Current MCMC computational cost 

The greatest computational challenge in constructing the 
chain is the fast evaluation of the matrix in 
Eqs. (|18[) fc (|19|l . If 250 timing-residuals are measured for 
each of the pulsars (50 weeks for 5 years), the size of the 
matrix C becomes (5000 x 5000). We find it takes about 20 
seconds to invert C and thus about 1.5 times as much to 
arrive to the next point in the chain. Therefore, for the re- 
quired 10^ chain points to get the convergent distribution, 
we need of order 1 month of the single-processor computa- 
tional time. On a cluster this can be done in a couple of days. 
We emphasize that this is an order process. For matrices 
of (2000 X 2000) the calculation can be done overnight on a 
single modern workstation, but for (10* x 10*) the calcula- 
tion is already a serious challenge. 

For the cur rently projected size of the datasets 
l|Manchesterll200^ . the amount of timing-residuals will most 
likely not exceed the 250 (Hobbs, private communications). 
Thus, the brute-force method presented here is not compu- 
tationally expensive for the projected data volume over the 
next 5 years. 



4.3 Choosing a suitable prior distribution 

For some models (e.g. the power law special density for 
pulsar timing noise) the likelihood function proves to be 
not normalisable. This would pose a serious problem in 
combination with uniform priors as the nuisance param- 
eters then cannot be marginalised and the posterior can- 
not represent a probability distribution. Although this is 
a sign that our model is incorrect (infinite Bayesian Evi- 
dence/normalisation), this can be easily solved with a dif- 
ferent parameterisation. We can always change coordinates 
in parameter space to a set for which all parameters have a fi- 
nite domain, which guarantees that our likelihood function is 
normalisable. This procedure is equivalent to choosing a dif- 
ferent prior (the Jacobian in the case of a coordinate trans- 
formation) for the original set. We therefore argue that we 
need to choose an appropriate prior for the non-normalisable 
parameters. We propose to use a Lorentzian shaped profile: 



Likelihood and prior distributions 



^^0(70 = 



A, 



(27) 



where 7i is the parameter for which we are construction a 
prior, and Ai is some typical width/value for this parameter. 

As an example we show the likelihood function and the 
prior for the pulsar timing noise spectral index parameter of 
Eq. (fT3)) in Fig. [T] The likelihood function seems to drop to 
zero for high 7^, but it actually has a non-negligible value 
for all 7i greater than the maximum likelihood value. The 
broadness of the prior is chosen such that it does not change 
the representation of the significant part of the likelihood in 
the posterior, but it does make sure that the posterior is 
normalisable. 



iikeiiliood function - 
prior distribution 



Puisar liming noise spectrai index y| 

Figure 1. The likelihood and prior distribution for a pulsar tim- 
ing noise spectral index parameters 7^. The solid line represents 
the likelihood function. It is sharply peaked and it looks as if it 
drops to zero for high 7^. However, for high ■ji it will have a con- 
stant non-negligible value. The dashed line represents our chosen 
prior distribution. The prior is normalisable, and it's application 
makes the posterior distribution normalisable as well. 



4.4 Generating mock data 

In order to generate mock data, we produce a realization 
of the multi dimensional Gaussian process of Eq. ([5]), as 
follows. We rewrite Eq. is a basis in which C is diagonal: 



n 



where, 



V3(x) ■- 



1 



27r 



™P — TT 



(28) 



(29) 



Here are the eigenvalues of C, and 

y = T-'st, (30) 

where T is the transformation matrix which diagonalizes C: 
{T-^CT)^, ^ X,S,j. (31) 

Thus we follow the following steps: 

(1) Diagonalize matrix C, find T and Ai. 

(2) Choose yi from random gaussian distributions of widths 
V%. 

(3) Compute the timing residuals via Eq. (|30p . 

It is then trivial to add deterministic processes, like 
quadratic spin-downs, to the simulated timing-residuals. 



5 TESTS AND PARAMETER STUDIES 

We test our algorithm by generating mock timing-residuals 
for a number of millisecond pulsars which are positioned 
randomly in the sky. We found it convenient to parametrise 
the GWB spectrum by [cf.Eq. (|6}] 



f 



(32) 
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Our mock timing-residuals are a single realisation of GWB 
for some values of A and 7 and the pulsar timing noise. 
Random quadratic-spin-down terms are added. We then 
perform several separate investigations as follows; 



5.1 Single dataset tests 

Our algorithm is tested on several datasets in the following 
way: 

The mock datasets were generated with parameters resem- 
bling an experiment of 20 pulsars, with observations approx- 
imately every 5 weeks for 5 years. The pulsar timing noise 
was set to an optimistic level of 100 ns each (rms timing 
residuals). In all cases the level of GWB has been set to 
A = IQ-^'^yr^/^ with 7 = 7/3. This level of GWB is an or- 
der of magn itude smaller than the most recent upper limits 
of this tvpe (|Jenet et al.ll2006l ). We then analyze this mock 
data using the MCMC method. In Figs[2]— [4]we see examples 
of the joint A — 7 probability distribution, obtained by these 
analyses. For each dataset we also calculate the maximum 
likelihood value of all parameters using a conjugate direc- 
tions search. The algorithm gives results consistent with the 
input parameters (i.e., they recover the amplitude and the 
slope of the GWB within measurement errors). This was 
observed in all our tests. 

For all datasets we also calculated the Fisher informa- 
tion matrix, a matrix consisting of second-order derivatives 
to all parameters, at the maximum likelihood points. We 
can use this matrix to approximate the posterior by a multi 
dimensional Gaussian, since for some particular models this 
approximation is quite good. The Fisher information matrix 
can be calculated in a fraction of the time needed to perform 
a full MCMC analysis. For all datasets we have plotted the 
la contour of the multi dimensional Gaussian approxima- 
tion. 

As an extra test, we have also used datas ets generated 
by th e popular pulsar timing package tempo2 (|Hobbs et al.l 
I2OO6I ) with a suitable GWB simulation plug-in (Hobbs et 
al., in preparation). We were able to generate datasets with 
exactly the same parameters as with our own algorithm, 
provided that the timing noise was white. We have confirmed 
that those datasets yield similar results when analysed with 
our algorithm. 

An important point is that that the spectral form of 
the timing noise has a large impact on the detectability of 
the GWB. For a red Lorentzian pulsar timing noise there is 
far greater degeneracy between the spectral slope and am- 
plitude in the timing residual data for the GWB than for 
white pulsar timing noise, and thus the overall signal-to- 
noise ratio is significantly reduced by the red component of 
the timing noise. 



Joint A-Y distribution, wttite noise 
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mcmc: 
mcmc: 
fisher: 




GWB Amplitude (10"'°yr"') 

Figure 2. The GW likelihood function (GW amplitude, GW 
slope vs prob. density contours), determined with the MCMC 
method for a set of mock data with 20 pulsars, and 100 data 
points per pulsar approximately evenly distributed over 5 years. 
Each pulsar has a white timing noise of 100ns. The true GW 
amplitude and slope are shown as a with an arrow, and the 
maximum likelihood values are shown as "x". The contours are 
in steps of cr, with the inner one at la. The Icr contour of the 
Gaussian approximation is also shown. 



Joint A-y distribution, Lorentzian noise 
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Figure 3. Same as in Fig. [2] but the mock data is generated 
and analysed using Lorentzian timing noise. Overall timing noise 
amplitude and characteristic frequency /o are taken to be 100ns 
and lyr~^ for each pulsar. 



search. The ensemble of maximum likelihood estimators for 
{A, 7) should be close to the true values used to generate 
the timing-residuals. 



5.2 Multiple datasets, same input parameters 

To estimate the robustness of our algorithm, we also 
perform a maximum likelihood search on many datasets: 

(a) We generate a multitude of mock timing-residual data 
for the same PTA configurations as in Sec. 15.11 with white 
timing noise. 

(b) For every one of these datasets we calculate the maxi- 
mum likelihood parameters using the conjugate directions 



The results of maximum likelihood search on many 
datasets is demonstrated in Fig. [5l The points are the maxi- 
mum likelihood values for individual datasets. It can be seen 
that the points are distributed in a shape similar, but not 
identical, to Fig. [2] some points are quite far ofl^ from the 
input parameters. In order to test the validity of the results, 
we calculate the Fisher information matrix at the maximum 
likelihood points, and show the la contour of the multi- 
dimensional Gaussian approximation based on the Fisher 
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Joint A-7 distribution, power-law noise 
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Figure 4. Same as in Fig. [2] but the mock data is generated 
and analysed using power-law timing noise. Overall timing noise 
amplitude and spectral index -ji are taken to be 100ns and 1.5 for 
each pulsar. For all -/i, a prior distribution according to Eg. 1271 



Figure 6. Density plot of the signal to noise ratio fi/a for dif- 
ferent realisations of timing-residuals. We have assumed monthly 
observations of pulsars with white timing noise of 100 ns each. 
The GWB amplitude has been set to IQ-^^yr^^^. 



Joint A-y distribution, ensemble of datasets 
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GWB Amplitude (10"'Sr"^) 

Figure 5. The maximum likelihood values for an ensemble of re- 
alisations of mock datasets, all with the same model parameters: 
100 ns white noise, 20 pulsars, and 100 data points per pulsar 
approximately evenly distributed over 5 years. The contours are 
confidence contours based on Fisher information matrix approxi- 
mations of the likelihood function. 

information matrix for three points. We wee that the error 
contours do not exclude the true values at high confidence, 
even though the Fisher matrix does not yield a perfect rep- 
resentation of the error contours (the true posterior is not 
perfectly Gaussian), and we have a posteriori selected out- 
liers for 2 of the 3 cases. 

5.3 Parameter studies 

To test the accuracy of the algorithm, and to provide sugges- 
tions for optimal PTA configurations, we conduct some pa- 
rameter studies on simplified sets of mock timing-residuals: 
(a) We generate many sets of mock timing-residuals for the 
simplified case of white pulsar timing noise spectra, all with 
the same white noise amplitude. The datasets are timing- 
residuals for some number of millisecond pulsars which are 



positioned randomly in the sky. We parametrize the GWB 
by Eq. (|32p . We then generate many sets of timing-residuals, 
varying several parameters [i.e., timing noise amplitude (as- 
sumed the same for all pulsars), duration of the experiment, 
and number of pulsars]. 

(b) For each of the mock datasets we approximate the like- 
lihood function by a Gaussian in the GWB amplitude A, 
with all other parameters fixed to their real value. We use 
A as a free parameter since it represents the strength of the 
GWB, and therefore the accuracy of ^ is a measure of the 
detectability. All other parameters are fixed to keep the com- 
putational time low, but this does result in a higher signal 
to noise ratio than is obtainable with a full MCMC analysis. 

(c) For this Gaussian approximation, we calculate the ratio 
- as an estimate for the signal to noise ratio, where /x is 
the value of A at which the hkelihood function maximizes, 
and a is the value of the standard deviation of the Gaussian 
approximation. Our results, represented as signal-to-noise 
contour plots for pairs of the input parameters, can be seen 
in Fig.[6l-Fig.[ll 

5.4 Comparison to other work 

More then a decade ago, iMcHugh et al.l l| 19961 ) used a 
Bayesian technique to produce upper limits on the GWB us- 
ing pulsar timing. We found the presentation of this work 
rather difficult to follow. Nonetheless, it is clear that the 
analysis presented here is more general than that of McHugh 
et al.: we treat the whole pulsar array, and not just a single 
pulsar; we take into account the extreme redness of the noise 
and develop the formalism to treat the systematic errors like 
quadratic spindown. 

Simultaneously with our work, a paper by Anholm et 
al. (2008, AOS) has appeared on the arxiv preprint ser- 
vice. Their approach was to construct a quadratic estimator 
(written explicitly in the frequency domain), which aims to 

^ We thank the anonymous referee for attracting our attention 
to this paper. 
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Figure 7. Density plot of the signal to noise ratio ix/a for dif- 
ferent realisations of timing-residuals. We have assumed 100 data 
points per pulsars, approximately evenly distributed over a period 
of 7.5 years. The GWB amplitude has been set to IQ-^^yri/^. 
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Figure 8. Density plot of the signal to noise ratio /i/a for dif- 
ferent realisations of timing-residuals. We have used a constant 
GWB amplitude of IQ-i^yri/^ and 20 pulsars. 

be optimally sensitive to the GWB. This improves on the 
original non-quadratic estimator of J05. However, a num- 
ber of issues important for the pulsar timing experiment re- 
mained unaddressed, the most important among them the 
extreme redness of the GWB and the need to subtract con- 
sistently the quadratic spindown. 



6 CONCLUSION 

In this paper wc have introduced a practical Bayesian algo- 
rithm for measuring the GWB using Pulsar Timing Arrays. 
Several attractive features of the algorithm should make it 
useful to the PTA community: 

(1) the ability to simultaneously measure the amplitude and 

slope of GWB, 

(2) its ability to deal with unevenly sampled datasets, and 

(3) its ability to treat systematic contributions of known 
functional form. Prom the theoretical point of view, the al- 
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Figure 9. Density plot of the signal to noise ratio ^ for different 
realisations of timing-residuals. We have used 20 pulsars with 
white pulsar timing noise levels of 100 ns each, with monthly 
observations. The GWB amplitude has been set to 10" 
The points and error bars are the mean and standard deviation 
of 10 realisations. 
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Figure 10. Plot of one over the signal to noise ratio (/i/cr)"^ with 
respect to the pulsar timing noise for an experiment of 5 years, 
20 pulsars and monthly observations. The GWB amplitude has 
been set to 10" ^^yr^/^. 

gorithm is guaranteed to extract information optimally, pro- 
vided that our parametrization of the timing noise is correct. 

Test runs of our algorithm have shown that the experi- 
ments signal-to-noise (S /N) ratio strongly decreases with the 
redness of the pulsar timing noise, and strongly increases 
with the duration of the PTA experiment. We have also 
charted the S/N dependence on the number of well-clocked 
pulsars and the level of their timing noise. These charts 
should be helpful in the design of the optimal strategy for 
future PTA observations. 
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Figure 11. Plot of the signal to noise ratio ^ with respect to 
the number of observed pulsars. The white timing noise of eax;h 
pulsar has been set to 100 ns and the observations were taking 
every 2 months for a period of 7.5 years. The GWB amplitude 
has been set to 10^^'''yr^/^. The points and error bars are the 
mean and standard deviation of 10 realisations. 
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Figure 12. Several plots of the signal to noise ratio ^ with re- 
spect to the level of the GWB amplitude. The number of pulsars 
was set at 20, with bi-weekly observations for a period of 5 years. 
The pulsar noise levels were set at 50, 100, 200, 500, 1000 ns for 
the different plots. The points and error bars are the mean and 
standard deviation of 10 realisations. 
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APPENDIX A 

In this Appendix we show exphcitly how to perform 
marginalization over the nuisance parameters ^ in Eq. (|16p . 
rewritten here for convenience: 



M 



VdetC 
xPo(a,C) 



exp 



-{5x F^C^iSx ~ FO 



(33) 



From here on we will assume that Po is independent of ^ 
(a flat prior). All values are therefore equally likely for all 
elements of ^ prior to the observations. This assumption is 
also implicitly made in the frequentist approach when fitting 
for these kinds of parameters as is done in popular pulsar 
timing packages. We now perform the marginalisation: 



P{cr\Sx) 



P(a,Cifc)d™C, 



(34) 



where m is the dimensionality of of 5. The idea now is to 
rewrite the the exponent E of Eq. (|33|l in such a way that 
we can perform a Gaussian integral with respect to ^ (we 
have to get rid of the F in front of 5). Therefore, we will 
expand E and complete the square with respect to ^: 

i T 



E = (5x-F^ C'^(5x-F^ 

= Sx^C'^Sx - 2i^F'^C'^Sx + i^F^C'^F^ 
= Sx'^C-'sl + (e - xfF^C'F (C - x) 

-X^F^C-'Fx, (35) 

where we have used the substitution: 

X = (F'^C-'p) F'^C-'^Sx. (36) 

Using this, we can write the ^ dependent part of the integral 
of Eq. (|34|l as a multi dimensional Gaussian integral: 

/ = /exp(^(C-x)"p"C-V(r-x))d™C 

= (27r)'" det {F'^C-^Py' . (37) 



From this it follows that: 



P{a\Sx) = 



M' 



^det(C) det(FTc-iir) 



(38) 



X exp 



- — (5a; ■ C'Sx 



where we have absorbed all constant terms in the normali- 
sation constant M' and where we have used: 
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